1. INTRODUCTION

Classification is one of the important areas in data mining. The main processes in classification are
pre-processing and feature selection, and both may influence classification performance. There are
many ways to perform pre-processing and feature selection, including discretization during the
pre-processing step. Discretization is the process of transforming continuous values into integer, or
discrete, values [1]. Discretization techniques can be categorized as supervised or unsupervised [2].
The goal of data discretization is to determine the best set of break points for grouping the data, where
a break point is a limit of an interval of integer values. In classification, the goal of discretization is to
produce a better classifier by yielding higher accuracy [3]. Discretization can also be useful for data
cleansing tasks, including missing value imputation and corrupt data detection (CDD). In 2018,
Garcia-Gil [4] proposed a new ensemble method based on PCA for the dimensionality reduction step and
Random Discretization. Missing value imputation techniques such as FIMUS [5] and CDD techniques such as
CAIRAD [6] also rely on discretization algorithms. Some data mining techniques require discrete rather
than continuous data, so it is very important to have algorithms for discretizing continuous data [7].

Discretization algorithms have been researched intensively and have been developed in combination with
various techniques, for different purposes and applications. Many algorithms are used to perform
discretization, such as statistical techniques based on the Chi-square statistic [8], heuristic
algorithms [9, 10] and the K-Means classifier [11]. Xing et al. [12] used the K-Means classifier with
Rough Set to transform continuous data into discrete data. Tahir et al. [13] used K-Means as the
discretization technique in a proposed method that achieves a better detection rate and accuracy.

Feature selection can lead to good classification accuracy. It is the process of identifying the
relevant features in a dataset. After the process, the selected features may amount to about half the
size of the original feature set. Bat Algorithm (BA) is used to find the optimal features in [14, 15].
Since BA is a powerful technique for optimizing the number of features, it has been combined with other
techniques such as rough set [16, 17]. BA is one of the Swarm Intelligence algorithms, initiated by Yang,
and is reported to be more powerful than other evolutionary algorithms [18].

Many researchers have extended the original BA to solve discrete problems, such as [19], fault system
diagnosis [20] and the Traveling Salesman Problem (TSP) [21]. Krause et al. [22] also applied BA to
handle discrete problems. In 2014, Nakamura et al. [23] introduced the binary bat algorithm to solve
feature selection problems. The main idea is to associate with each bat a set of binary coordinates that
denote whether a feature will belong to the final set of features or not. Since the problem is whether or
not to select a given feature, the bat's position is represented by a binary vector. Enache and
Sgarciu [24] proposed a feature selection technique that transforms continuous data into discrete data
for intrusion detection applications.

The objective of this study is to investigate the efficiency of BA on discrete datasets and to find the
optimum features in discrete datasets. To demonstrate the efficiency of BA, this study proposes one
technique that comprises a discretization technique and a feature selection technique.
Our contribution covers two processes of classification. First, we contribute to the pre-processing
process by exploiting the benefits of K-Means and BA. BA is an optimization technique used to select the
optimal features from a dataset, but its drawback is that it cannot convert data into discrete values.
Since the K-Means classifier is able to discretize data, BA and K-Means are combined into a
discretization technique known as BkMD. Second, we apply BA as the feature selection technique, and the
result is called BkMDFS. BkMDFS is thus one technique that comprises a discretization technique (BkMD)
and a feature selection technique (BA).

The proposed techniques, BkMD and BkMDFS, are evaluated and compared with existing techniques by
applying them to datasets from various applications. The results show that the proposed techniques
lead to considerably better accuracy. The merit of this study is a heuristic and random search, based
on an objective criterion, to find the optimized features in a dataset.

The literature review is used in Section 1, "Introduction", to explain how this manuscript differs from
other papers and where it is innovative; in Section 2, "Research Technique", to describe the steps of
the research; and in Section 3, "Results and Analysis", to support the analysis of the results. The
conclusion is discussed in Section 4, "Conclusion".


2. RESEARCH TECHNIQUE 
2.1. Dataset and Experimental Settings 

Fourteen datasets from different applications were used to validate the study. The features in these
datasets are numerical only, and more than half of the features in each dataset are continuous. The
datasets are Credit Approval (Ds1), Ecoli (Ds2), Hill Valley (Ds3), Image Segmentation (Ds4), Libras
Movement (Ds5), Plant Species (Ds6), Steel Plates Faults (Ds7), Urban Land (Ds8), Automobile (Ds9),
Abalone (Ds10), Yeast (Ds11), Waveform (Ds12), Ionosphere (Ds13) and Water Treatment (Ds14). These
datasets, called the original datasets (ODS), were obtained from the UCI Machine Learning Repository
[UCI], which is publicly available for research. The number of instances ranges from 159 to 5000, and
the number of attributes ranges from 8 to 147. All information about the datasets used in this study is
shown in Table 1. Meanwhile, Figure 1 illustrates the BkMDFS framework, showing how the technique
incorporates the discretization approach and the feature selection technique to improve classification
performance. The study can be summarized in three processes as follows:
Discretization: This is the initial process, which transforms the original datasets into discrete
datasets. Two discretization techniques are used, K-Means and BkMD, and each is applied to each
dataset. At the end of the discretization process, two discretized datasets have been created, so three
groups of datasets are used in the next process:
a) The original dataset, called ODS.
b) The dataset discretized with K-Means, called kMDS.
c) The dataset discretized with BkMD, called BkMDS.
Feature selection: This process selects the optimum features in each dataset. Two techniques are used:
Information Gain (IG) and BA.
Classification: In this paper, the accuracy and sensitivity of the classifiers over the selected
features are used to compare the discretization and feature selection techniques on the original and
discrete datasets. Two classifiers are used: Naive Bayes (NB) and k-Nearest Neighbor (kNN). Each
classifier is applied to each dataset.


Table 1. Dataset Information

#     Dataset               Number of Instances    Number of Attributes
1.    Credit Approval       690                    15
2.    Ecoli                 336                    8
3.    Hill Valley           606                    100
4.    Image Segmentation    210                    19
5.    Libras Movement       360                    90
6.    Plant Species         1600                   64
7.    Steel Plates Faults   1941                   27
8.    Urban Land            507                    147
9.    Automobile            159                    25
10.   Abalone               4177                   8
11.   Yeast                 1484                   8
12.   Waveform              5000                   21
13.   Ionosphere            351                    34
14.   Water Treatment       523                    38





[Figure 1 panel label: Performance Evaluation of Different Classifiers]

Figure 1. Framework of BkMDFS


2.2. Bat-K-Means Discretization Technique, BkMD
In BkMD, each bat flies randomly with a velocity $v_i$ at a position $x_i$, also known as a solution,
with a varying frequency (or wavelength) $f_i$ and loudness $A_i$. Generally, the frequency can be
described as (1).

$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \qquad (1)$$

where $\beta \in [0,1]$ is a random number drawn from a uniform distribution and $f_i \in [0,2]$, that
is, $f_{\min} = 0$ and $f_{\max} = 2$ are the minimum and maximum values of the frequency.
The velocity $v_i$ and location $x_i$ can then be updated according to (2) and (3), respectively.

$$v_i^{t} = v_i^{t-1} + (x_i^{t} - x_*)\,f_i \qquad (2)$$

$$x_i^{t} = x_i^{t-1} + v_i^{t} \qquad (3)$$

where $x_*$ denotes the current global best solution.
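
As a rough illustration of the update rules (1) to (3), the following Python sketch moves a single bat by one step. It assumes that the bracketed term in the velocity update is the offset of the bat from the current global best position, here called `x_best`, as in the standard Bat Algorithm. The function name `bat_step` and the optional `beta` argument are hypothetical and serve only to make the example reproducible.

```python
import random

F_MIN, F_MAX = 0.0, 2.0  # frequency bounds given in the text


def bat_step(x, v, x_best, beta=None):
    """Advance one bat by one iteration and return (new_x, new_v, f)."""
    if beta is None:
        beta = random.random()  # uniform draw from [0, 1)
    f = F_MIN + (F_MAX - F_MIN) * beta  # Eq. (1): frequency
    # Eq. (2): velocity update; treating x_best as the global best is an assumption.
    new_v = [vi + (xi - bi) * f for vi, xi, bi in zip(v, x, x_best)]
    # Eq. (3): position update.
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v, f


new_x, new_v, f = bat_step(x=[1.0, 2.0], v=[0.0, 0.0], x_best=[0.0, 2.0], beta=0.5)
print(new_x, new_v, f)
```

Fixing `beta` makes the step deterministic; leaving it as `None` draws a fresh random frequency on every call, which is how the rule is normally used.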


In this research, the initial value of $A_i$ is 0.5. The aim of this technique is to search for and find
the bat's prey, which denotes the features that contribute to higher classification performance, by
changing the frequency, loudness and pulse emission rate $r$, under the assumption that the best
solution is the bat that finds the best features, in order to select the optimal features from the
dataset. BA is used to generate a set of centroids, $C_k = (c_1, c_2, \ldots, c_k)$, for each dataset,
with the fitness given by (4).

$$\text{Fitness} = \sum_{1}^{N} U^{2} \qquad (4)$$

where $U$ denotes the data in the dataset and $N$ is the number of attributes, called features.

BA only handles finding the optimized features. In discretization, however, the essential step is
converting the continuous domain into a corresponding integer domain. Discretization sets up discrete
range intervals, bounded by breakpoints, for the continuous attribute values in a dataset. According to
these breakpoints, the continuous attribute values are assigned integer values such as 0, 1, 2 and 3.
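
To make the role of breakpoints concrete, the sketch below maps continuous values to integer codes with Python's standard `bisect` module. The breakpoints `[1.0, 2.0, 3.0]` and the sample values are invented for illustration, and the convention that a value equal to a breakpoint falls into the upper interval is an assumption, not something the text specifies.

```python
from bisect import bisect_right


def discretize(values, breakpoints):
    """Return, for each value, the index of the interval it falls into."""
    cuts = sorted(breakpoints)
    # bisect_right counts how many breakpoints are <= the value,
    # which is exactly the integer code of its interval.
    return [bisect_right(cuts, v) for v in values]


codes = discretize([0.2, 1.5, 2.7, 9.0], breakpoints=[1.0, 2.0, 3.0])
print(codes)
```

Three breakpoints give four intervals, so the codes range over 0, 1, 2 and 3.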

K-Means was described by J. MacQueen in 1967 [26]. K-Means is an iterative algorithm that begins with a
set of $k$ reference points called centroids [13]. First, the data are partitioned into $k$ clusters. A
data point $x$ becomes a member of the $k$-th cluster if $c_k$ is the centroid closest to $x$. The
positions of the centroids and the assignment of the data points to clusters are computed iteratively,
and the computation is repeated until the optimal solution is obtained. K-Means can be defined as (5).

$$\sum_{k=1}^{K} \sum_{i \in S_k} \lVert x_i - c_k \rVert^{2} \qquad (5)$$

where $K$ is the number of classes, $S_k$ is the set of data points in the $k$-th group, $c_k$ is the
centroid of the corresponding $k$-th class, and $x_i$ is the $i$-th data point.


The strength of K-Means is its ability to transform continuous values into discrete values. However,
K-Means is not able to find the optimized features in a dataset, so BA is used to address this problem.
In other words, K-Means and BA complement each other.
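
The following sketch shows how plain K-Means can discretize one continuous feature: each value is replaced by the index of its nearest centroid, and the quantity in (5) is the sum of squared distances to those centroids. It is not the BkMD technique itself, in which BA supplies the centroids. The function names, the deterministic starting centroids and the sample data are all assumptions made for the example.

```python
def nearest(v, centroids):
    """Index of the centroid closest to v (squared distance, as in Eq. (5))."""
    return min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)


def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd iterations on one continuous feature; returns sorted centroids."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): k evenly spaced order statistics.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Keep the old centroid when a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def sse(values, centroids):
    """Value of the objective in Eq. (5) for the given centroids."""
    return sum((v - centroids[nearest(v, centroids)]) ** 2 for v in values)


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centroids = kmeans_1d(values, k=3)
codes = [nearest(v, centroids) for v in values]
print(codes, round(sse(values, centroids), 3))
```

Because the centroids are returned in ascending order, the integer codes preserve the ordering of the original continuous values.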


2.3. BA Feature Selection Technique 

Each attribute in the dataset corresponds to a feature in BA, and the number of instances in the
dataset corresponds to the number of bats in BA. Each instance, or bat, has a set of attributes, or
features. Each bat's position is represented as a binary bit string of length $N$, where $N$ is the
total number of features. Each bit denotes a feature: '1' represents a selected feature and '0' an
unselected feature. The objective of BA is to find the bat with the minimum global best. Each bat has
its own local best and its own binary string. The binary string with the minimum local best is the
winner, and this winner is called the global best; the selected features are those corresponding to the
global best. For instance, in Table 2 there are 4 bats in BA, corresponding to 4 instances in the
dataset, and 5 features in BA, corresponding to 5 attributes in the dataset. Each bat has its own local
best. Among the 4 local bests, the minimum is 0.111, from bat number 2. So the winning bat is bat
number 2, with global best gb = 0.111 and selected features fs = {f2, f4, f5}.


Table 2. Selected Features

         Feature, f
Bat      1    2    3    4    5    Local best
1        1    0    1    0    0    0.134
2        0    1    0    1    1    0.111
3        1    1    1    1    1    0.345
4        1    1    0    0    0    0.543
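
The selection of the global best in Table 2 can be traced with a few lines of Python. The bit strings and local best values are taken from the table; the helper names `pick_global_best` and `apply_mask`, and the sample row passed to `apply_mask`, are hypothetical.

```python
# Rows of Table 2: bat number -> (binary position, local best).
bats = {
    1: ([1, 0, 1, 0, 0], 0.134),
    2: ([0, 1, 0, 1, 1], 0.111),
    3: ([1, 1, 1, 1, 1], 0.345),
    4: ([1, 1, 0, 0, 0], 0.543),
}


def pick_global_best(bats):
    """Return (winning bat, global best value, names of the selected features)."""
    winner = min(bats, key=lambda b: bats[b][1])  # minimum local best wins
    bits, gb = bats[winner]
    selected = [f"f{i}" for i, bit in enumerate(bits, start=1) if bit == 1]
    return winner, gb, selected


def apply_mask(row, bits):
    """Keep only the attribute values whose bit is 1."""
    return [value for value, bit in zip(row, bits) if bit == 1]


winner, gb, selected = pick_global_best(bats)
print(winner, gb, selected)
print(apply_mask([10, 20, 30, 40, 50], bats[winner][0]))
```

The mask step shows how the winning bit string would reduce one data row to the selected attributes only.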





2.4. Classification 

The classification process is used to measure the quality of the features selected from the original
and discretized datasets. Two supervised classifiers are used, Naive Bayes (NB) and k-Nearest Neighbor
(kNN), with two performance measures, accuracy and sensitivity. Each classifier is applied to each dataset.
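
For reference, the two performance measures can be computed from true and predicted labels as sketched below. The paper does not state how sensitivity is aggregated when a dataset has more than two classes, so the macro average of per-class recall used here is an assumption, and the label lists are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Macro-averaged recall: mean over classes of TP / (TP + FN)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]  # instances of class c
        tp = sum(y_pred[i] == c for i in idx)              # correctly recovered
        recalls.append(tp / len(idx))
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred))
```

In this toy case three of four labels are correct, while the recall is 2/3 for class 0 and 1 for class 1, so the two measures differ.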


2.5. Performance Evaluation 

In this study, we propose BkMDFS, which comprises a discretization technique (BkMD) and a feature
selection technique (BA). To evaluate BkMDFS, we used two classifiers, Naive Bayes (NB) and k-Nearest
Neighbor (kNN), with two performance measures, accuracy and sensitivity. Furthermore, to provide
evidence of the capability of BA in discretization and feature selection, 6 other techniques, shown in
Table 3, are employed. DPSO-RS was proposed by Noorhaniza in 2010 [25]. It was developed to improve PSO
so that it can handle discrete data, and it is hybridized with Rough Set to solve the optimal feature
selection problem. Each technique in Table 3 is applied to each dataset. This gives seven sets of
selected features for each dataset, or 98 sets in total (7 techniques x 14 datasets). The
classification process is then used to measure the quality of the selected features.


Table 3. Comparison Techniques

Technique   Description
BAO         Bat algorithm as a feature selection method using ODS datasets.
IGO         Information Gain as a feature selection method using ODS datasets.
kMB         Bat algorithm as a feature selection method using kMDS datasets.
kMIG        Information Gain as a feature selection method using kMDS datasets.
BkMDFS      Bat algorithm as a feature selection method using BkMDS datasets.
BkMDIG      Information Gain as a feature selection method using BkMDS datasets.
DPSO-RS     Discrete Particle Swarm Optimization with Rough Set, using ODS datasets.





3. RESULTS AND ANALYSIS 

The experiments were conducted to assess the robustness of the proposed technique, BkMDFS, which
combines a discretization technique and a feature selection technique. The 14 datasets from various
domains listed in Table 1 were used for the experimental evaluation.

Table 4 shows the accuracy for the Naive Bayes and k-Nearest Neighbor classifiers. With the Naive Bayes
classifier, BkMDFS outperforms the other techniques on 5 of the 14 datasets: Ds1, Ds3, Ds5, Ds10 and
Ds12. With the k-Nearest Neighbor classifier, BkMDFS outperforms on only 4 datasets, compared with 7
datasets for IGO.
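
The count for the Naive Bayes classifier can be reproduced directly from the accuracy values in Table 4. The sketch below copies the seven Naive Bayes columns and lists the datasets on which a technique is strictly better than every other technique. Counting only strict wins (ties excluded) is an assumption about how "outperform" is meant, the helper name `strict_wins` is hypothetical, and the row between Ds12 and Ds14 is keyed as Ds13.

```python
TECHNIQUES = ["BAO", "IGO", "kMB", "kMIG", "BkMDFS", "BkMDIG", "DPSO-RS"]

# Naive Bayes accuracy columns of Table 4, in the order of TECHNIQUES.
NB_ACCURACY = {
    "Ds1": [0.628, 0.783, 0.733, 0.752, 0.852, 0.842, 0.808],
    "Ds2": [0.694, 0.962, 0.494, 0.656, 0.776, 0.656, 0.962],
    "Ds3": [0.515, 0.519, 0.253, 0.253, 0.522, 0.253, 0.519],
    "Ds4": [0.619, 0.781, 0.651, 0.599, 0.732, 0.603, 0.77],
    "Ds5": [0.626, 0.647, 0.479, 0.371, 0.661, 0.345, 0.65],
    "Ds6": [0.785, 0.849, 0.086, 0.079, 0.788, 0.372, 0.861],
    "Ds7": [0.616, 0.667, 0.505, 0.54, 0.64, 0.623, 0.616],
    "Ds8": [0.761, 0.83, 0.332, 0.169, 0.824, 0.56, 0.613],
    "Ds9": [0.486, 0.562, 0.091, 1, 0.494, 0.507, 0.569],
    "Ds10": [0.182, 0.197, 0.076, 0.114, 0.198, 0.181, 0.192],
    "Ds11": [0.517, 0.591, 0.239, 0.233, 0.544, 0.421, 0.514],
    "Ds12": [0.831, 0.841, 0.653, 0.711, 0.85, 0.477, 0.832],
    "Ds13": [0.788, 0.842, 0.784, 0.813, 0.855, 0.862, 0.845],
    "Ds14": [0.613, 0.739, 0.351, 0.56, 0.725, 0.56, 0.723],
}


def strict_wins(table, technique):
    """Datasets on which `technique` scores strictly higher than all others."""
    col = TECHNIQUES.index(technique)
    wins = []
    for name, row in table.items():
        others = [score for i, score in enumerate(row) if i != col]
        if row[col] > max(others):
            wins.append(name)
    return wins


print(strict_wins(NB_ACCURACY, "BkMDFS"))
```

The same function applied to the k-Nearest Neighbor columns, or to the sensitivity table, would give the corresponding counts for the other comparisons.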

Table 5 shows the sensitivity for the Naive Bayes and k-Nearest Neighbor classifiers. With the Naive
Bayes classifier, BkMDFS outperforms the other techniques on 4 of the 14 datasets: Ds1, Ds3, Ds5 and
Ds12. With the k-Nearest Neighbor classifier, BkMDFS outperforms on only 4 datasets, compared with 5
datasets for IGO.

From Figure 2, we focus on the best discretization technique for each dataset, the best classifier for
each dataset and the best combination of both for each type of dataset. As Figure 2 shows, BkMDFS
outperforms on all performance measures for the Naive Bayes classifier on the discretized datasets.
For sensitivity, BkMDFS also outperforms with the k-Nearest Neighbor classifier.


Table 4. The Performance of Accuracy

Naive Bayes (columns 2 to 8), k-Nearest Neighbor (columns 9 to 15)
Dataset BAO IGO kMB kMIG BkMDFS BkMDIG DPSO-RS BAO IGO kMB kMIG BkMDFS BkMDIG DPSO-RS
Ds1 0.628 0.783 0.733 0.752 0.852 0.842 0.808 0.609 0.827 0.733 0.756 0.809 0.849 0.804 
Ds2 0.694 0.962 0.494 0.656 0.776 0.656 0.962 0.704 0.932 0.597 0.689 0.721 0.689 0.928 
Ds3 0.515 0.519 0.253 0.253 0.522 0.253 0.519 0.589 0.584 0.496 0.496 0.59 0.496 0.586 
Ds4 0.619 0.781 0.651 0.599 0.732 0.603 0.77 0.771 0.894 0.629 0.651 0.847 0.651 0.886 
Ds5 0.626 0.647 0.479 0.371 0.661 0.345 0.65 0.816 0.865 0.638 0.591 0.871 0.412 0.867 
Ds6 0.785 0.849 0.086 0.079 0.788 0.372 0.861 0.644 0.753 0.09 0.083 0.666 0.242 0.747 
Ds7 0.616 0.667 0.505 0.54 0.64 0.623 0.616 0.642 0.713 0.55 0.566 0.724 0.575 0.679 
Ds8 0.761 0.83 0.332 0.169 0.824 0.56 0.613 0.751 0.816 0.251 0.127 0.759 0.507 0.673 
Ds9 0.486 0.562 0.091 1 0.494 0.507 0.569 0.717 0.855 0.091 1 0.786 0.557 0.799 
Ds10 0.182 0.197 0.076 0.114 0.198 0.181 0.192 0.19 0.199 0.062 0.138 0.205 0.224 0.197 
Ds11 0.517 0.591 0.239 0.233 0.544 0.421 0.514 0.487 0.524 0.33 0.195 0.497 0.532 0.504 
Ds12 0.831 0.841 0.653 0.711 0.85 0.477 0.832 0.745 0.785 0.58 0.684 0.896 0.69 0.764 
Ds13 0.788 0.842 0.784 0.813 0.855 0.862 0.845 0.896 0.871 0.829 0.809 0.867 0.863 0.866
Ds14 0.613 0.739 0.351 0.56 0.725 0.56 0.723 0.563 0.699 0.326 0.522 0.633 0.522 0.679 





Table 5. The Performance of Sensitivity

Naive Bayes (columns 2 to 8), k-Nearest Neighbor (columns 9 to 15)
Dataset BAO IGO kMB kMIG BkMDFS BkMDIG DPSO-RS BAO IGO kMB kMIG BkMDFS BkMDIG DPSO-RS
Ds1 0.596 0.765 0.733 0.752 0.852 0.842 0.803 0.61 0.828 0.733 0.757 0.809 0.846 0.803 
Ds2 0.735 0.973 0.565 0.67 0.786 0.67 0.973 0.711 0.932 0.64 0.696 0.735 0.696 0.935 
Ds3 0.505 0.507 0.503 0.503 0.508 0.503 0.507 0.589 0.584 0.497 0.495 0.591 0.495 0.586 
Ds4 0.619 0.786 0.6 0.571 0.729 0.605 0.776 0.771 0.895 0.586 0.638 0.848 0.648 0.886 
Ds5 0.606 0.631 0.464 0.367 0.647 0.317 0.636 0.811 0.861 0.622 0.556 0.864 0.381 0.861 
Ds6 0.774 0.842 0.084 0.094 0.781 0.357 0.851 0.639 0.739 0.083 0.112 0.658 0.228 0.737 
Ds7 0.536 0.604 0.457 0.511 0.58 0.509 0.548 0.641 0.714 0.552 0.558 0.726 0.574 0.679 
Ds8 0.75 0.821 0.369 0.232 0.815 0.589 0.548 0.738 0.804 0.25 0.232 0.738 0.476 0.672 
Ds9 0.453 0.566 0.302 1 0.509 0.409 0.572 0.698 0.836 0.302 1 0.774 0.56 0.774 
Ds10 0.233 0.241 0.156 0.155 0.236 0.204 0.232 0.192 0.201 0.053 0.155 0.207 0.258 0.198 
Ds11 0.53 0.579 0.337 0.336 0.551 0.371 0.531 0.486 0.522 0.342 0.337 0.499 0.524 0.504 
Ds12 0.814 0.81 0.551 0.649 0.84 0.368 0.806 0.746 0.785 0.584 0.685 0.889 0.691 0.764 
Ds13 0.798 0.826 0.775 0.803 0.843 0.863 0.829 0.895 0.863 0.821 0.798 0.86 0.863 0.858
Ds14 0.62 0.734 0.304 0.52 0.717 0.505 0.721 0.585 0.713 0.509 0.514 0.652 0.528 0.694 

Figure 2. Number of datasets where each discretization technique yields the best


4. CONCLUSION 

The objective of this study is to investigate the efficiency of BA as a discretization and feature
selection technique. The drawback of BA, however, is that it cannot transform continuous values into
discrete values. We therefore employ K-Means to solve this problem and propose a discretization
technique named BkMD. Then, to demonstrate the capability of BA in feature selection, we integrate BA
with BkMD, and the result is known as BkMDFS. To provide evidence of the effectiveness of the proposed
techniques, we ran the experiments on 14 datasets from various applications. Two classifiers, Naive
Bayes and k-Nearest Neighbor, are used to measure the quality of the features selected with BkMDFS.

In this study, the experiments have demonstrated that discretization combined with the Bat algorithm is
capable of correctly predicting class membership. BkMDFS is able to improve classification performance
(accuracy and sensitivity) on discretized datasets. Although BkMDFS outperforms most of the comparison
techniques, it still cannot compete with the benchmark technique, Information Gain (IG). Thus, BkMDFS
has room for improvement in future research.